Unsupervised distance metric learning using predictability
Authors: Abhishek A. Gupta, Dean P. Foster, Lyle H. Ungar
Abstract
Distance-based learning methods, such as clustering and SVMs, depend on good distance metrics. This paper addresses unsupervised metric learning in the context of clustering. We seek transformations of the data that yield clean, well-separated clusters, where a clean cluster is one whose membership can be accurately predicted. The transformation (and hence the distance metric) is obtained by minimizing the blur ratio, defined as the within-cluster variance divided by the total data variance in the transformed space. To minimize it, we propose an iterative procedure, Clustering Predictions of Cluster Membership (CPCM). CPCM alternately (a) predicts cluster memberships (e.g., using linear regression) and (b) clusters these predictions (e.g., using k-means). With linear regression and k-means, this algorithm is guaranteed to converge to a fixed point. The resulting clusters are invariant to linear transformations of the original features, and the procedure tends to eliminate noise features by driving their weights to zero.

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-08-23.

This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/885

Abhishek A. Gupta, Department of Statistics, University of Pennsylvania
Dean P. Foster, Department of Statistics, University of Pennsylvania
Lyle H. Ungar, Department of Computer and Information Science, University of Pennsylvania
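The NumPy sketch below illustrates the blur ratio and the CPCM alternation described in the abstract. It is a rough reading, not the report's implementation. It makes two assumptions:

- Step (a) is ordinary least squares with an intercept, regressing one-hot cluster indicators on the features.
- Step (b) is plain Lloyd's k-means, warm-started from the current clusters' centroids in prediction space.

The function names `blur_ratio`, `kmeans`, `farthest_first` and `cpcm`, as well as the synthetic data, are hypothetical.

```python
import numpy as np


def blur_ratio(Z, labels):
    """Within-cluster variance divided by total variance of the rows of Z."""
    total = ((Z - Z.mean(axis=0)) ** 2).sum()
    within = sum(((Z[labels == j] - Z[labels == j].mean(axis=0)) ** 2).sum()
                 for j in np.unique(labels))
    return within / total


def farthest_first(Z, k):
    """Deterministic seeding: start at row 0, then repeatedly add the farthest row."""
    idx = [0]
    for _ in range(k - 1):
        d = ((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return Z[idx]


def kmeans(Z, centers, n_iter=100):
    """Plain Lloyd's k-means started from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cpcm(X, k, max_iter=50):
    """Hypothetical CPCM loop: regress memberships on X, then re-cluster the predictions."""
    X1 = np.column_stack([np.ones(len(X)), X])           # add an intercept column
    labels = kmeans(X, farthest_first(X, k))             # initial clusters in the raw space
    for _ in range(max_iter):
        Y = np.eye(k)[labels]                            # one-hot membership matrix, n x k
        B, *_ = np.linalg.lstsq(X1, Y, rcond=None)       # (a) predict memberships linearly
        P = X1 @ B                                       # predictions = transformed data
        # Warm-start k-means from the current clusters' centroids in prediction space;
        # an empty cluster falls back to an arbitrary row (a crude choice for this sketch).
        centers = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else P[j]
                            for j in range(k)])
        new_labels = kmeans(P, centers)                  # (b) cluster the predictions
        if np.array_equal(new_labels, labels):           # fixed point reached
            break
        labels = new_labels
    return labels, P, B


# Toy data: one informative feature (column 0) plus three pure-noise features.
rng = np.random.default_rng(0)
n = 200
truth = np.repeat([0, 1], n // 2)
signal = np.where(truth == 0, -5.0, 5.0) + rng.normal(0.0, 0.5, n)
X = np.column_stack([signal, rng.normal(0.0, 1.0, (n, 3))])

labels, P, B = cpcm(X, k=2)
agreement = max(np.mean(labels == truth), np.mean(labels != truth))  # up to label swap
print("agreement:", agreement, "blur ratio:", blur_ratio(P, labels))
print("regression weights (rows: intercept, signal, noise x3):\n", B)
```

On this toy data, the weights on the three noise columns should be much smaller than the weight on the informative column. That is the behaviour the abstract describes, though this check is not a reproduction of the report's experiments.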
Similar resources
Semi-supervised composite kernel learning using distance metric learning techniques
The distance metric plays a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric directly affects their performance. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
Some Research Problems in Metric Learning and Manifold Learning
In the past few years, metric learning, semi-supervised learning, and manifold learning methods have aroused a great deal of interest in the machine learning community. Many machine learning and pattern recognition algorithms rely on a distance metric. Instead of choosing the metric manually, a promising approach is to learn the metric from data automatically. Besides some early work on metric ...
Distance Metric Learning: A Comprehensive Survey
Many machine learning algorithms, such as k-nearest neighbor (KNN), rely heavily on the distance metric for the input data patterns. Distance metric learning aims to learn a distance metric for the input space from a given collection of pairs of similar/dissimilar points that preserves the distance relations among the training data. In recent years, many studies have demonstrated, both empi...
Metric learning for unsupervised phoneme segmentation
Unsupervised phoneme segmentation aims to divide a speech stream into phonemes without using any prior knowledge of linguistic content or acoustic models. In [1], we formulated this problem within an optimization framework and developed an objective function, the summation of squared error (SSE), based on the Euclidean distance of cepstral features. However, it is unknown whether or not Euclidean...
An Overview of Distance Metric Learning
In our previous comprehensive survey [41], we have categorized the disparate issues in distance metric learning. Within each of the four categories, we have summarized existing work, disclosed their essential connections, strengths and weaknesses. The first category is supervised distance metric learning, which contains supervised global distance metric learning, local adaptive supervised dista...